A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
Abstract
Frequent itemset mining and association rule generation are challenging tasks in data streams. Although various algorithms have been proposed to address them, frequency alone does not determine the interestingness of a mined itemset, and hence of the association rules derived from it. This has motivated algorithms that mine association rules based on utility, i.e., the proficiency of the mined rules. However, few algorithms in the literature deal with utility, as most focus on reducing the complexity of frequent itemset and association rule mining. Moreover, those few algorithms consider only the overall utility of the association rules and not the consistency of the rules over a defined number of periods. To address this, the paper proposes an enhanced association rule mining algorithm. The algorithm adds a new weightage validation step to conventional association rule mining in order to validate both the utility of the mined rules and the consistency of that utility. Utility is validated by an integrated calculation over the cost/price efficiency of the itemsets and their frequency. Consistency is validated after every defined number of windows using the probability distribution function, assuming that the weights are normally distributed. Hence, the validated rules are frequent and utility-efficient, and their interestingness is distributed throughout the entire time period. The algorithm is implemented, and the resulting rules are compared against the rules obtained from conventional mining algorithms.
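The abstract does not give the exact weight formula or consistency test, so the following is only a minimal sketch of the two validation steps under stated assumptions. It assumes that an itemset's weight in a window is its support multiplied by the summed unit profit of its items, and that a rule counts as consistent when a normal distribution fitted to its per-window weights puts at least `min_prob` probability on weights of `min_weight` or more. The names `window_weight`, `is_consistent`, `min_weight` and `min_prob` are hypothetical and do not come from the paper.

```python
from statistics import NormalDist

def window_weight(transactions, itemset, profit):
    """Assumed weight of an itemset in one window:
    support (fraction of transactions containing it) times
    the summed unit profit of its items."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    support = hits / len(transactions)
    return support * sum(profit[item] for item in itemset)

def is_consistent(weights, min_weight, min_prob=0.9):
    """Assumed consistency test over a batch of windows:
    fit a normal distribution to the per-window weights and
    require P(weight >= min_weight) >= min_prob."""
    if len(weights) < 2:
        return False  # not enough windows to estimate a spread
    dist = NormalDist.from_samples(weights)
    if dist.stdev == 0:
        # All windows have the same weight; compare it directly.
        return dist.mean >= min_weight
    return 1.0 - dist.cdf(min_weight) >= min_prob

# Illustrative data only (not from the paper).
profit = {"a": 2, "b": 3, "c": 1}
window = [{"a", "b"}, {"a", "b", "c"}, {"c"}, {"a", "b"}]
w = window_weight(window, {"a", "b"}, profit)

steady = is_consistent([3.75, 3.5, 4.0, 3.6], min_weight=3.0)
bursty = is_consistent([3.75, 0.5, 4.0, 0.2], min_weight=3.0)
```

In this sketch, an itemset whose weight stays high in every window passes the check, while one that is heavy in only some windows fails it, even if both have a similar peak weight. That matches the abstract's distinction between overall utility and utility that holds across periods.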
Similar resources
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria, which are used for evaluating a rule, can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach for evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively short period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
Modeling interestingness of streaming association rules as a benefit-maximizing classification problem
doi:10.1016/j.knosys.2008.07.003. In a typical application of association rule learning from market basket data, a set of transactions for a fixe...
An Efficient Algorithm for Mining Sequential Rules with Interestingness Measures
Mining sequential rules is an important problem in data mining research. It is commonly used for market decisions, management and behaviour analysis. In traditional association-rule mining, rule interestingness measures such as confidence are used for determining relevant knowledge. They can reduce the size of the search space and select useful or interesting rules from the set of the discover...
Reduction of Number of Association Rules with Inter Itemset Distance in Transaction Databases
Association rule discovery has been an important problem of investigation in knowledge discovery and data mining. An association rule describes associations among the sets of items which occur together in transactions of databases. The association rule mining task consists of finding the frequent itemsets and the rules in the form of conditional implications with respect to some prespecified thr...